Visualize properties on a map.

Examine the pattern of missing values.

Many features contain a high percentage of NAs. These features need to be dealt with later.
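As a rough illustration of the missing-value check, the sketch below computes the share of NAs per column with pandas. The toy table and its values are made up for the example; the column names are only stand-ins for the real features.

```python
import numpy as np
import pandas as pd


def missing_fraction(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values per column, highest first."""
    return df.isna().mean().sort_values(ascending=False)


# Toy data standing in for the properties table (values are invented).
toy = pd.DataFrame({
    "fireplacecnt": [np.nan, np.nan, 1.0, np.nan],
    "garagecarcnt": [2.0, np.nan, 1.0, 2.0],
    "yearbuilt": [1990, 2001, 1975, 1988],
})

na_share = missing_fraction(toy)
print(na_share)
```

Sorting by the NA share makes it easy to pick out the features that will need imputation or removal later.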

Examine the distribution of log error

Log error is normally distributed. I labeled properties with log error more than 2 SDs above the mean as overestimated outliers, and properties with log error more than 2 SDs below the mean as underestimated outliers. I then examined the latitude and longitude distributions of these outliers.
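The 2-SD labeling rule can be sketched as follows. This is a minimal example on invented numbers, not the original analysis code; the function name `label_outliers` and the label strings are hypothetical.

```python
import pandas as pd


def label_outliers(logerror: pd.Series, n_sd: float = 2.0) -> pd.Series:
    """Label each value as overestimated, underestimated, or normal.

    A value is an outlier when it lies more than n_sd sample standard
    deviations above or below the mean of the series.
    """
    mean, sd = logerror.mean(), logerror.std()
    labels = pd.Series("normal", index=logerror.index)
    labels[logerror > mean + n_sd * sd] = "overestimated"
    labels[logerror < mean - n_sd * sd] = "underestimated"
    return labels


# Invented log errors: 18 exact predictions plus one large error each way.
toy_logerror = pd.Series([0.0] * 18 + [1.0, -1.0])
labels = label_outliers(toy_logerror)
print(labels.value_counts())
```

The resulting labels can then be joined back to latitude and longitude to plot where each kind of outlier falls.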

Examine the distribution of the number of transactions by day and the absolute mean logerror by day.
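One way to build both daily series is a single grouped aggregation. The sketch below assumes a date column named `transactiondate` (an assumption, not stated above) and interprets the error measure as the mean of the absolute logerror per day; the data are invented.

```python
import pandas as pd

# Invented transactions; "transactiondate" is an assumed column name.
toy = pd.DataFrame({
    "transactiondate": pd.to_datetime(
        ["2016-01-01", "2016-01-01", "2016-01-02"]
    ),
    "logerror": [0.1, -0.3, 0.05],
})

# Count transactions and average the absolute log error for each day.
daily = (
    toy.assign(abs_logerror=toy["logerror"].abs())
    .groupby("transactiondate")
    .agg(
        n_transactions=("logerror", "size"),
        mean_abs_logerror=("abs_logerror", "mean"),
    )
)
print(daily)
```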

Examine the variation in log error across various discrete and continuous features. Found some interesting patterns with yearbuilt, number of bedrooms, number of bathrooms, fireplacecnt, garagecarcnt, number of units, and propertylandusetypeid.

I grouped the features into 3 groups: features related to the physical characteristics of the house (house_features), features related to tax (tax_features), and features related to geo-location (geo_features). I made correlation plots to examine each feature’s correlation with log error and absolute log error within each group.
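The per-group correlations behind such plots could be computed roughly as below. This is a sketch under assumptions: the columns `bedroomcnt` and `taxamount` and the group membership are illustrative only, and the toy values are deliberately perfectly linear, so they do not reflect the real data.

```python
import pandas as pd


def corr_with_target(df: pd.DataFrame, features: list, target: str) -> pd.Series:
    """Pearson correlation of each feature column with the target column."""
    return df[features].corrwith(df[target])


# Invented data; column names and group membership are assumptions.
toy = pd.DataFrame({
    "bedroomcnt": [1, 2, 3, 4],
    "taxamount": [4.0, 3.0, 2.0, 1.0],
    "logerror": [0.1, 0.2, 0.3, 0.4],
})
toy["abs_logerror"] = toy["logerror"].abs()

groups = {
    "house_features": ["bedroomcnt"],
    "tax_features": ["taxamount"],
}

# One correlation table per group: rows are features, columns are targets.
corr_tables = {
    name: pd.DataFrame({
        "logerror": corr_with_target(toy, cols, "logerror"),
        "abs_logerror": corr_with_target(toy, cols, "abs_logerror"),
    })
    for name, cols in groups.items()
}
print(corr_tables["house_features"])
```

Each table can be passed to a heatmap or bar plot to compare features within a group.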

No feature seems to have a strong correlation with either log error or absolute log error.